Tensilica Unveils Groundbreaking Next-Generation Xtensa LX Processor Core; Industry's Highest Performance Core Will Replace RTL in SOC Designs
SANTA CLARA, Calif.—(BUSINESS WIRE)—May 18, 2004—
Tensilica(R), Inc. today unveiled its next-generation
Xtensa(R) LX configurable processor, the highest performance processor
core on the market, featuring both higher computational throughput and
dramatically higher I/O (input/output) bandwidth. This record-breaking
performance, combined with Tensilica's patented automated design and
development environment, makes Xtensa LX the only processor fast and
flexible enough to replace register transfer logic (RTL) design
methodologies in system-on-chip (SOC) designs, leading to reduced
development time and risk along with dramatic increases in ROI (return
on investment) for semiconductor and systems companies. Xtensa LX is
also ideally suited as a traditional control processor in embedded
applications. Tensilica expects that most of its customers will use
multiple Xtensa LX cores in each SOC design, each tailored to speed a
different part of the customer's application.
"With chip development costs now surging past $10 million, SOC
development teams need to reduce project development time, risk and
cost," said Chris Rowen, president and CEO of Tensilica. "With the
Xtensa LX processor, designers can configure optimized processors
specifically tuned to their application in a fraction of the time that
it takes to design and verify RTL, with comparable computational and
I/O performance. The inherent programmability of the processor gives
designers the flexibility to fix bugs and add features purely in
software at any point -- late in the design cycle or long after first
shipment. This is impossible with hard-coded RTL."
The Xtensa LX processor core features significant innovations in
four key areas:
-- Lower power, a key requirement for all SOC designs;
-- Improved I/O throughput, so the processor can move data in and
out at terabit/second speeds;
-- Improved compute performance, so the processor can process
complex algorithms much faster; and
-- Better interfaces for on-chip memories, so the processor isn't
slowed down by memory access speeds.
Tensilica supports these technical innovations with a patented
development environment that automatically and simultaneously
generates an optimized hardware implementation, a corresponding
tailored software tool chain, and a complete set of EDA models and
scripts. Configuration and extension choices made by the designer to
address requirements for a given application are immediately and
automatically reflected in the entire software tool chain. With
alternative approaches, this is typically a manual, error-prone task
that requires extensive verification.
Lower Power Consumption
Tensilica has automated the insertion of fine-grain clock gating
for every functional element of the Xtensa LX processor, including
functions conceived of and created by the designer. Clock gating is a
very effective power reduction technique that shuts off the clock to
parts of the logic that are not in use on a particular clock
cycle. Because automatic insertion of clock gating is only available
for restricted RTL design coding styles, manual, error-prone
post-layout tuning of clock circuits is often required for standard
RTL design.
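As a rough illustration of why fine-grain clock gating saves power, the following sketch counts how many functional-unit clocks toggle with and without gating. The four units and the activity trace are hypothetical, and the model assumes clock power is simply proportional to the number of clocked units per cycle; it is not a description of Tensilica's implementation.

```python
def clocked_unit_cycles(activity, gated):
    """Count unit-cycles in which a functional unit receives a clock edge.

    Without gating every unit is clocked on every cycle; with gating a
    unit is clocked only on cycles where it is in use.
    """
    total = 0
    for units_in_use in activity:
        for in_use in units_in_use:
            if in_use or not gated:
                total += 1
    return total

# Hypothetical trace: one row per cycle, one column per unit
# (ALU, multiplier, load/store, designer-defined function).
ACTIVITY = [
    (True, False, False, False),
    (True, True, False, False),
    (False, False, True, False),
    (True, False, False, True),
]

ungated = clocked_unit_cycles(ACTIVITY, gated=False)
gated = clocked_unit_cycles(ACTIVITY, gated=True)
savings = 1 - gated / ungated

print(f"Clocked unit-cycles without gating: {ungated}")
print(f"Clocked unit-cycles with gating:    {gated}")
print(f"Clock activity avoided:             {savings:.1%}")
```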
The Xtensa LX processor's new architecture dramatically lowers
power consumption in large configurations with many designer-defined
functions. But even without designer modification, the Xtensa LX
processor is designed to use power very efficiently. The minimum
configuration of the Xtensa LX processor dissipates a miserly 0.05
mW/MHz in a representative 130 nm process technology. By comparison,
the smallest member of the ARM synthesizable processor family, the
ARM7TDMI-S, burns 0.11 mW/MHz in 130 nm technology -- more than twice
the power consumption of the Xtensa LX.
I/O Throughput Improved By Three Orders of Magnitude
Two major innovations improve I/O throughput in Xtensa LX
processors: an option for a second load/store unit and
designer-defined ports and queues.
Designers using the Xtensa LX processor can choose one or two
128-bit wide load/store units. Most standard embedded processors have
only a single narrow (32- or 64-bit) load/store unit. However, many
applications benefit from two load/store units for data-intensive
inner loops -- a standard feature of many high-end DSP processors. The
Xtensa LX processor's optional second load/store unit provides greater
sustained general-purpose I/O bandwidth and an XY-style memory access
for DSP applications. Additionally, at 128 bits, each unit is two to
four times as wide as the 32- or 64-bit load/store units of standard
embedded processors.
The true breakthrough in I/O is the capability to add
designer-defined ports and queues, which allow the Xtensa LX processor
to communicate as fast and as flexibly as RTL blocks. Ports are wires
that directly connect two Xtensa LX processors or an Xtensa LX
processor to external RTL. Port connections can be arbitrarily wide,
allowing wide data types to be transferred easily without the need for
multiple load/store operations. As many as one million signals (1024
ports, each 1024 bits wide) can be used. While this is an outrageous
number, far exceeding the performance demands of real systems today
(it provides 350 terabits/sec of direct data flow per processor in a
130 nm CMOS process), it clearly demonstrates that old notions of the
I/O bottlenecks inherent in a processor-based solution are now
obsolete.
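The port figures can be checked with simple arithmetic. The sketch below assumes every port transfers its full width on every clock cycle and uses the 350 MHz worst-case clock quoted in the Specifications section; pairing that clock with the port figure is an assumption, since the announcement does not state which clock rate the terabit figure is based on.

```python
NUM_PORTS = 1024
PORT_WIDTH_BITS = 1024
CLOCK_HZ = 350e6  # assumed: the 130 nm worst-case clock from the Specifications section

def peak_bits_per_second(num_ports, width_bits, clock_hz):
    """Peak rate if every port moves its full width on every cycle."""
    return num_ports * width_bits * clock_hz

signals = NUM_PORTS * PORT_WIDTH_BITS
tbits_per_sec = peak_bits_per_second(NUM_PORTS, PORT_WIDTH_BITS, CLOCK_HZ) / 1e12

print(f"Total port signals: {signals:,}")
print(f"Peak port bandwidth: {tbits_per_sec:.1f} Tbit/s")
```

Under these assumptions the result is about 367 Tbit/s, the same order as the roughly 350 terabits/sec quoted above.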
While ports are ideal to quickly convey control and status
information, queues provide a high-speed mechanism to transfer
streaming data. Input queues and output queues appear to the
programmer like traditional processor registers -- with
the notable exception that data is always available without the need
to load or store the data before and after computation. Queues can
sustain data rates as high as one transfer every clock cycle or over
350 Gbits/sec for each queue added to an Xtensa LX processor. Custom
instructions can perform multiple queue operations per cycle, perhaps
combining inputs from two input queues with local data and sending the
computed values to two output queues. The high bandwidth and low
control overhead of queues allow the Xtensa LX processor to be used
in applications with extreme data rates.
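The queue behavior described above can be sketched as a small software model. The instruction, its name (butterfly_step), and the data values are hypothetical; the model only illustrates one operation that reads two input queues, combines the values with local data, and writes two output queues, with no separate load or store steps. Real hardware queues would also stall when an input is empty or an output is full, which this sketch does not model.

```python
from collections import deque

def butterfly_step(in_a, in_b, out_sum, out_diff, scale):
    """Model one hypothetical custom instruction.

    Pops one value from each input queue, combines them with a local
    operand (scale), and pushes one result to each output queue.
    """
    a = in_a.popleft()
    b = in_b.popleft()
    out_sum.append((a + b) * scale)
    out_diff.append((a - b) * scale)

# Hypothetical streaming data arriving on two input queues.
in_a = deque([1, 2, 3, 4])
in_b = deque([10, 20, 30, 40])
out_sum, out_diff = deque(), deque()
SCALE = 2  # local data held in the processor

cycles = 0
while in_a and in_b:
    butterfly_step(in_a, in_b, out_sum, out_diff, SCALE)
    cycles += 1  # one modeled instruction per cycle

print(list(out_sum), list(out_diff), cycles)
```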
Ports and queues specified by the designer are automatically added
to the Xtensa LX processor and are fully modeled by Tensilica's
Xtensa Processor Generator. The full behavior of the port or queue,
just like any other modification made to the Xtensa LX processor, is
automatically reflected in the custom software development tools,
instruction set simulator, bus functional model and EDA scripts --
within about an hour. And because it's automated using Tensilica's
patented technology, it's pre-verified and correct by construction --
no need to re-verify the processor.
Improved Compute Performance
Tensilica improved compute performance in the Xtensa LX processor
through its innovative FLIX (Flexible Length Instruction Xtensions)
architecture. The FLIX architecture is a highly efficient
implementation of the Xtensa instruction set architecture (ISA) that
gives designers more options for cost/performance tradeoffs. The FLIX
technology provides the flexibility to freely and modelessly intermix
instructions of various lengths (16-, 24-, or 32-/64-bit). By packing
multiple operations into a wide 32- or 64-bit instruction word, FLIX
technology allows designers to accelerate a broader class of "hot
spots" in embedded applications. FLIX eliminates the performance and
code-size drawbacks that can occur when using a one-size-fits-all
instruction length. Compared to rigid, high-performance processor
designs that either encode only one RISC operation per instruction or
use ultra-wide 64b/128b/256b VLIW (very long instruction word)
formats, FLIX delivers high-performance concurrent execution exactly
and only when needed, yet preserves the industry-leading code density
advantages of the Xtensa processor's native 16b/24b base architecture
instruction formats.
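The code-size argument for mixing instruction lengths can be illustrated with a back-of-the-envelope comparison. Every number in the sketch below is hypothetical (the operation counts, the three operations per 64-bit word, and the assumption that control code cannot fill the extra slots of a wide word); it does not describe the actual FLIX encoding.

```python
# All figures are hypothetical and chosen only for illustration.
CONTROL_OPS = 900       # operations in ordinary control code
HOT_OPS = 300           # operations in "hot spot" inner loops
OPS_PER_WIDE_WORD = 3   # assumed operations packed into one 64-bit word
NARROW_BITS = 24
WIDE_BITS = 64

def encode(control_bits, hot_bits, hot_ops_per_instr):
    """Return (code size in bits, instruction count) for one scheme.

    Control code is assumed to hold one operation per instruction.
    """
    hot_instrs = HOT_OPS // hot_ops_per_instr
    size_bits = CONTROL_OPS * control_bits + hot_instrs * hot_bits
    return size_bits, CONTROL_OPS + hot_instrs

narrow_only = encode(NARROW_BITS, NARROW_BITS, 1)            # one op per 24-bit instruction
wide_only = encode(WIDE_BITS, WIDE_BITS, OPS_PER_WIDE_WORD)  # every instruction is 64 bits
mixed = encode(NARROW_BITS, WIDE_BITS, OPS_PER_WIDE_WORD)    # wide words only in hot loops

for name, (bits, count) in [("narrow only", narrow_only),
                            ("wide only", wide_only),
                            ("mixed", mixed)]:
    print(f"{name:12s} {bits:6d} bits, {count:5d} instructions")
```

In this toy mix, the mixed encoding matches the instruction count of the all-wide format while staying close to the code size of the all-narrow format.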
Better Interfaces to On-Chip Memories
To address the growing speed disparity between standard cell logic
and memories (memory access speeds have not scaled as well as logic in
the migration from 180 nm to 130 nm and now 90 nm), the Xtensa LX
processor features a configurable pipeline. Designers can select two
additional clock cycles for memory access if required by the
application. While Tensilica's traditional 5-stage pipeline is very
efficient for many applications, designers employing very large local
memories or low-power memories with slower access speeds will find
advantages in moving to a longer pipeline, resulting in a higher
system clock frequency.
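The effect of a longer pipeline on clock frequency can be sketched with a simple model in which the clock period is set by the slowest stage. The stage delays below are hypothetical, and the model ignores the extra branch and load latency that a deeper pipeline introduces.

```python
def max_clock_mhz(logic_stage_ns, memory_access_ns, memory_cycles):
    """Clock limit when a memory access is spread over memory_cycles cycles.

    The slowest stage, either logic or one slice of the memory access,
    sets the minimum clock period.
    """
    critical_ns = max(logic_stage_ns, memory_access_ns / memory_cycles)
    return 1000.0 / critical_ns

# Hypothetical delays: logic stages are faster than the memory access.
LOGIC_STAGE_NS = 2.0
MEMORY_ACCESS_NS = 3.2

single_cycle_mhz = max_clock_mhz(LOGIC_STAGE_NS, MEMORY_ACCESS_NS, 1)
two_cycle_mhz = max_clock_mhz(LOGIC_STAGE_NS, MEMORY_ACCESS_NS, 2)

print(f"Memory access in one cycle:  {single_cycle_mhz:.1f} MHz")
print(f"Memory access in two cycles: {two_cycle_mhz:.1f} MHz")
```

With these assumed delays, giving the memory access a second cycle moves the limit from the memory to the logic stages.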
Leading Benchmark Scores
In addition to being the ideal alternative methodology for
hardware block design, the Xtensa LX processor excels at traditional
CPU and DSP tasks in embedded SOCs, as demonstrated by industry-leading
benchmark results on the EEMBC (Embedded Microprocessor Benchmark
Consortium) Consumer benchmark suite and the BDTI Benchmarks(TM) by
Berkeley Design Technology, Inc. (BDTI).
The EEMBC Consumer benchmark "out of the box" score was 171.6 @
330 MHz (0.51997 per MHz), nearly a 9X performance advantage over the
ARM1020E. See the separate press release issued today, titled "Tensilica's
Xtensa LX Processor Beats All Other 32- and 64-bit Processor Cores on
EEMBC Consumer 'Out of the Box' Scores."
The Xtensa LX BDTIsimMark2000(TM) score of 6150 for a 370 MHz
configuration is 70% faster than the score for the next-fastest
licensable core benchmarked by BDTI, the CEVA-X1620.(1) See the separate
press release issued today, titled "Tensilica's New Xtensa LX
Processor Earns Top BDTIsimMark2000(TM) Score."
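The per-MHz normalization quoted for the EEMBC score is the score divided by the clock frequency. The sketch below repeats that calculation and, purely as an illustration, applies the same normalization to the BDTIsimMark2000 score; the per-MHz BDTI value is derived here and is not a figure published in this announcement.

```python
# Scores and clock rates quoted above.
EEMBC_CONSUMER_SCORE = 171.6
EEMBC_CLOCK_MHZ = 330.0
BDTI_SIMMARK_SCORE = 6150.0
BDTI_CLOCK_MHZ = 370.0

def per_mhz(score, clock_mhz):
    """Normalize a benchmark score by clock frequency."""
    return score / clock_mhz

eembc_per_mhz = per_mhz(EEMBC_CONSUMER_SCORE, EEMBC_CLOCK_MHZ)
bdti_per_mhz = per_mhz(BDTI_SIMMARK_SCORE, BDTI_CLOCK_MHZ)  # derived, not a published figure

print(f"EEMBC Consumer score per MHz:  {eembc_per_mhz:.3f}")
print(f"BDTIsimMark2000 score per MHz: {bdti_per_mhz:.2f}")
```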
Specifications
The base Xtensa LX processor consumes approximately 27,500 gates
when synthesized for minimum power and area, and achieves 350 MHz
(worst case conditions) in TSMC's 130 nm LV process technology when
optimized for speed. In 90 nm technology, the 7-stage version of the
Xtensa LX processor can achieve over 500 MHz.
Pricing and Availability
Tensilica's pricing structure is based on a licensing fee per
processor instance plus royalties based on the volume of processors
manufactured. Each licensed processor instance can be targeted to any
silicon foundry technology. Licensing fees for a single processor
configuration start at $550,000 for the Xtensa LX processor including
the Vectra LX DSP engine. The Xtensa Software Developers Toolkit
(which includes the Xtensa Xplorer development environment, Xtensa
C/C++ compiler, and Xtensa Instruction Set Simulator) and the TIE
Compiler are priced separately. Customers can begin to take advantage of the
new features of the Xtensa LX processor early this summer.
Xtensa LX is an addition to the Tensilica processor family, which
includes the proven Xtensa V configurable processor. Customers will be
able to continue to license the Xtensa V processor. The Xtensa V
processor and the Xtensa LX processor both implement the common core
Xtensa instruction set.
About Tensilica
Tensilica was founded in July 1997 to address the growing need for
optimized, application-specific microprocessors for high-volume
embedded applications. With the Xtensa and Xtensa LX configurable and
extensible microprocessor cores, Tensilica is the only company that
has automated and patented the time-consuming process of generating a
customized microprocessor core along with a complete
software-development tool environment, producing new configurations in
a matter of hours. These customized processors rival hand-coded RTL in
performance and add a needed level of programmability. For more
information, visit www.tensilica.com.
(1) The BDTIsimMark2000(TM) provides a summary measure of DSP
speed. For more information and scores see www.BDTI.com. Scores (C)
2004 BDTI. The Xtensa LX score includes use of 12 custom TIE
instructions that expand the area of the core by 16%. Licensees may
require greater or lesser degrees of customization. The scores for all
other cores assume that no coprocessors or other customizations were
used. The scores for the Xtensa LX and all other cores are for worst
case operating conditions in a commercially available 130 nm process.
Contact info@BDTI.com for more information.
Editors' Notes:
-- Tensilica and Xtensa are registered trademarks belonging to
Tensilica Inc.
-- BDTI Benchmarks and BDTIsimMark2000 are trademarks of Berkeley
Design Technology, Inc.
-- Tensilica's announced licensees include Agilent, AMCC (JNI
Corporation), Astute Networks, Avision, Bay Microsystems,
Berkeley Wireless Research Center, Broadcom, Cisco Systems,
Conexant Systems, Cypress, Crimson Microsystems, ETRI,
FUJIFILM Microdevices, Fujitsu Ltd., Hudson Soft, Hughes
Network Systems, Ikanos Communications, LG Electronics,
Marvell, MediaWorks, NEC Laboratories America, NEC
Corporation, Nippon Telegraph and Telephone (NTT), Olympus
Optical Co. Ltd., S2io, Solid State Systems, Sony,
STMicroelectronics, TranSwitch Corporation, and Victor Company
of Japan (JVC).
Contact:
Tensilica
Paula Jones, 408-327-7343
paula@tensilica.com
or
Joany Draeger, 650-365-3395
joany@taniscomm.com